Explorative data visualization#
2023.05.09, 2023.05.16
In this tutorial, we will show how to use tools in the scientific Python ecosystem for interactive and explorative data visualization.
Goals#
Interactively examine a satellite image using rioxarray and hvplot.
Interactively slice the image using the ipyleaflet map.
Installation#
You will need to install rioxarray, hvplot’s xarray module, and ipyleaflet if you intend to run this notebook on the Callysto Hub. We will use rioxarray and hvplot for Goal 1 and ipyleaflet for Goal 2 (along with pyproj, which Callysto already has).
!pip install rioxarray hvplot ipyleaflet
Goal 1 procedure#
Let’s import the necessary modules:
import rioxarray
import hvplot.xarray
Note
rioxarray is an experimental package that extends xarray’s capability by adding the rasterio data accessor.
hvplot#
By Holoviz contributors, BSD-3-Clause license
Hvplot is a high-level plotting API in the Python ecosystem. It works with many popular data libraries such as (geo)pandas and xarray (which we’ll see in this and the next hand-on topics) and visualize data using various plotting libraries based on user’s choice.
Load the data#
Let’s use this Sentinel-2 image hosted on the AWS open data repository:
image_url = 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/51/R/UH/2023/3/S2B_51RUH_20230311_0_L2A/TCI.tif'
The full asset associated with this true-color image (TCI) can be accessed using this STAC browser. You can see that the true-color image has a label called COG, which stands for cloud-optimized geotiff.
Now we use rioxarray to fetch the data without downloading it:
# da stands for data array
da = rioxarray.open_rasterio(image_url)
da
<xarray.DataArray (band: 3, y: 10980, x: 10980)>
[361681200 values with dtype=uint8]
Coordinates:
* band (band) int64 1 2 3
* x (x) float64 3e+05 3e+05 3e+05 ... 4.098e+05 4.098e+05 4.098e+05
* y (y) float64 2.8e+06 2.8e+06 2.8e+06 ... 2.69e+06 2.69e+06
spatial_ref int64 0
Attributes:
AREA_OR_POINT: Area
OVR_RESAMPLING_ALG: AVERAGE
_FillValue: 0
scale_factor: 1.0
add_offset: 0.0This gridded file is now represented using the xarray data structure. For more geospatial information, you can call the rasterio accessor’s attributes and functions, such as:
print(f"Grid width: {da.rio.width} pixels")
print(f"Grid height: {da.rio.height} pixels")
print(f"Geospatial transform (m): \n{da.rio.transform()}")
Grid width: 10980 pixels
Grid height: 10980 pixels
Geospatial transform (m):
| 10.00, 0.00, 300000.00|
| 0.00,-10.00, 2800020.00|
| 0.00, 0.00, 1.00|
You probably won’t feel you have downloaded any data so far, despite that you have seen the metadata shown above. This is why a cloud-optimized geotiff is powerful for cloud computing – we will only download the minimum necessary part of data. By the way, this TCI.tif file has a size of 276 MB (and you can check that by a manual download), so downloading the full dataset would have taken tens of seconds or so.
We will talk more about cloud-optimized geotiffs during the next hands-on topic (open data sets in remote sensing) and the corresponding lectures.
Interactive visualization#
This raster image has ~120 million pixels, and it would take a lot of time for any visualization tools to read them all and render it on your screen. I have commented out this line, but feel free to try it if you don’t mind waiting:
# da.hvplot.image(x='x', y='y', data_aspect=1)
I have examined the data and decided to have a smaller slice for furthur visualization:
# Here x and y slices are based on the UTM coordinates.
da_subset = da.sel(x=slice(320000, 330000), y=slice(2780000, 2770000))
Now it’s time to visualize the cropped data using hvplot:
da_subset.hvplot.image(x='x', y='y', data_aspect=1)
Note that you can switch the band view by sliding the bar in the right.
The rectangular blocks in the image are buildings and runways at the Taoyuan airport. You can better recognize it on the True color image:
da_subset.hvplot.rgb(x='x', y='y', data_aspect=1, bands='band')
Some questions for you!#
Can you invert the colormap so bright colors represent high reflectance, which will be easier for readers to intepret?
By reading the hvplot gridded data tutorial, can you make a line plot with three lines, each representing the DN value of a specific band along the x coordinate at a fixed y coordinate (say
y = 2774000)? Like this:What band number is Red? How about Green and Blue? How do you know that?
Show code cell source
# Check here for question 1! https://holoviews.org/user_guide/Colormaps.html
da_subset.sel(y=slice(2774000, 2773960)).hvplot.line(x='x', by='band', groupby='y')
Goal 2 procedure#
In the previous section, we select a sub-area of the image by giving the boundary UTM coordinates. Here we will try a more intuitive and interactive method using ipyleaflet for data slicing.
Let’s import additional modules first:
from ipyleaflet import Map, basemaps, basemap_to_tiles, Rectangle, DrawControl
from pyproj import Transformer
ipyleaflet#
By Volodymyr Agafonkin, BSD-2-Clause license
ipyleaflet is a Python API for leaflet, a JavaScript-based library for web maps. Ipyleaflet uses the Notebook UI to render web maps, which allows users to interactively explore the geospatial content with the help of extra functionality from IPython, such as the widgets.
Make a map showing the full extent of the image#
To show the extent of the image on a web map, we need to extract this information from the source file:
bounds = da.rio.bounds()
print(bounds)
print(f'Coordinates of the lower left corner: {bounds[:2]}')
print(f'Coordinates of the upper right corner: {bounds[2:]}')
(300000.0, 2690220.0, 409800.0, 2800020.0)
Coordinates of the lower left corner: (300000.0, 2690220.0)
Coordinates of the upper right corner: (409800.0, 2800020.0)
These coordinates are based on the coordinate reference system (CRS) of the source file, which is EPSG 32651 (WGS 84 / UTM zone 51N). The CRS information can be retrieved by the following variable:
da.rio.crs
CRS.from_epsg(32651)
Note
Take a look at here for additional explanation about EPSG.
To show the extent on a web map, we have to transform the coordinates from EPSG 32651 to Lat/Lon (a.k.a. EPSG 4326). We can achieve this by the code below:
# Construct a coordinate transformer from dataset's CRS to 4326 (using pyproj)
transformer = Transformer.from_crs(da.rio.crs, "epsg:4326")
# Transform the coordinates
coords = [transformer.transform(x, y) for x,y in [bounds[:2], bounds[2:]]]
coords
[(24.312301875374363, 121.02911850575056),
(25.314010096875364, 122.10386391236358)]
Now we can start from the previous sample code and construct the webmap. Note that I added a few more lines for drawing a rectangle on the map:
# Map center set to the lower left corner of the image
center = coords[0]
# Again, we use OpenStreetMap as the basemap.
m = Map(
basemap=basemap_to_tiles(basemaps.OpenStreetMap.Mapnik),
center=center,
zoom=8
)
# Add the image extent
rectangle = Rectangle(bounds=coords)
m.add_layer(rectangle)
# Enable rectangle draw (and disable the other draws)
draw_control = DrawControl(polyline={}, polygon={}, circlemarker={})
draw_control.rectangle = {
"shapeOptions": {
"fillOpacity": 0.5
}
}
m.add_control(draw_control)
# Display the map
m
Note
For technical details about DrawControl, see this documentation.
Here’s a task for you#
Draw a rectangle on the map using the tool bar in the left! We will use this rectangle as the bounding box to slice the image and display the data within the box.
Display the image within the selected area#
First, get the coordinates of the new rectangle by the following code:
rec_coords = draw_control.last_draw.get('geometry')
rec_coords = rec_coords['coordinates']
xy1 = rec_coords[0][0] # lower left
xy2 = rec_coords[0][2] # upper right
print(f"Lower left: {xy1} \nUpper right: {xy2}")
Lower left: [121.179529, 25.057011]
Upper right: [121.303785, 25.126658]
Now we are ready to use another transformer to convert the coordinates back to the data CRS:
transformer2 = Transformer.from_crs("epsg:4326", da.rio.crs)
slicing_coords = [transformer2.transform(lat, lon) for lon, lat in [xy1, xy2]]
slicing_coords
[(316361.7036834039, 2772496.3688340243),
(328995.23770548543, 2780047.5615810486)]
Use these numbers to slice the data again and finally plot the image within the new bounding box!
da_subset = da.sel(x=slice(slicing_coords[0][0], slicing_coords[1][0]), y=slice(slicing_coords[1][1], slicing_coords[0][1]))
da_subset
<xarray.DataArray (band: 3, y: 755, x: 1264)>
[2862960 values with dtype=uint8]
Coordinates:
* band (band) int64 1 2 3
* x (x) float64 3.164e+05 3.164e+05 3.164e+05 ... 3.29e+05 3.29e+05
* y (y) float64 2.78e+06 2.78e+06 2.78e+06 ... 2.773e+06 2.773e+06
spatial_ref int64 0
Attributes:
AREA_OR_POINT: Area
OVR_RESAMPLING_ALG: AVERAGE
_FillValue: 0
scale_factor: 1.0
add_offset: 0.0da_subset.hvplot.image(x='x', y='y', data_aspect=1, cmap='gray')
Every time you make a new rectangle selection on the map, you can repeat the slicing steps to extract and show the selected content.